Implementation of load acquire/store release instructions using load/store operation with dmb operation

ABSTRACT

A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.

TECHNICAL FIELD

The subject disclosure relates to memory operation ordering in a reduced instruction set computing environment.

BACKGROUND

In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information cooperatively from one thread to another. Acquire and release semantics are used to accomplish passing information cooperatively from one thread to another. Acquire and release semantics provide a structural system for ensuring that memory operations are ordered correctly to avoid errors. Store release instructions ensure that all previous instructions are completed, and load-acquire instructions ensure that all following instructions will complete only after the load-acquire completes. Accordingly, to properly order memory operations using acquire and release semantics, complex combinations of store release and load acquire instructions are necessary.

The above description is merely intended to provide a contextual overview of current techniques for performing memory operation ordering and is not intended to be exhaustive.

SUMMARY

The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the disclosed subject matter. It is intended to neither identify key nor critical elements of the disclosure nor delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Various embodiments also provide for ordering memory operations with respect to the instructions disclosed herein. A typical load with acquire instruction only requires that memory operations after the load with acquire are ordered after the load with acquire; it does not impose any order on the instructions before the load with acquire (both with respect to the load with acquire and to the subsequent instructions). In an embodiment of the disclosure, however, a load with acquire comprises a data memory barrier that is used in conjunction with a load operation, which guarantees that all accesses prior to and including the load with acquire are ordered before all accesses from instructions after the load with acquire.

Similarly, traditional store with release instructions impose ordering between the access from the store with release and the accesses of all prior instructions (but not subsequent instructions). In an embodiment of the disclosure, however, a data memory barrier at the beginning of the store with release provides a strong ordering between prior accesses and the access associated with the store with release.

In an example embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.

In another example embodiment, a method comprises executing instructions in a processor. The method can include a load with acquire instruction for performing memory operation ordering, wherein executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.

In an example embodiment, a system comprises a processor that executes computer-executable instructions to perform operations. The instructions can include a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.

In an example embodiment, a method comprises executing instructions in a processor. The method can include a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.

The following description and the annexed drawings set forth in detail certain illustrative aspects of the subject disclosure. These aspects are indicative, however, of but a few of the various ways in which the principles of various disclosed aspects can be employed and the disclosure is intended to include all such aspects and their equivalents. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.

FIG. 2 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.

FIG. 3 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.

FIG. 4 is a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein.

FIG. 5 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a load with acquire instruction.

FIG. 6 illustrates a flow diagram of an example, non-limiting embodiment of a method for executing a store with release instruction.

FIG. 7 illustrates a flow diagram of an example, non-limiting embodiment of a method for filtering memory operations using a data memory barrier.

FIG. 8 illustrates a block diagram of an example electronic computing environment that can be implemented in conjunction with one or more aspects described herein.

DETAILED DESCRIPTION

The disclosure herein is described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the subject innovation. It may be evident, however, that various disclosed aspects can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the subject innovation.

Various embodiments provide for a system that simplifies load acquire and store release semantics that are used in reduced instruction set computing (RISC). In lock free computing, there are two ways in which threads can manipulate shared memory: they can compete with each other for a resource, or they can pass information cooperatively from one thread to another. These semantics are complex, however, and replacing the specialized semantics with simple data memory barriers can simplify the process of memory ordering. Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using a data memory barrier in conjunction with load and store instructions can provide sufficient ordering using simple brute force ordering operations.

As used in this disclosure, the terms “instruction”, “operation”, and “access” refer to separate processes and are not interchangeable. An instruction is composed of one or more operations, while an operation may include zero or more memory accesses or barriers. By way of example, a load with acquire instruction creates two operations (a load operation and a barrier operation). This barrier splits all memory accesses into two groups. The first group comprises accesses from all instructions prior to the load with acquire as well as the access from the load operation that belongs to the load with acquire. The second group comprises accesses from all instructions after the load with acquire instruction.
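By way of illustration only, the terminology above can be sketched in Python. The class names `Operation` and `Instruction`, the helper `split_at_barrier`, and the address labels are hypothetical names chosen for this sketch and do not appear in the disclosure; the sketch merely shows how the barrier belonging to a load with acquire partitions accesses into the two groups described.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Operation:
    kind: str                      # "load", "store", or "barrier"
    address: Optional[str] = None  # barriers carry no address


@dataclass(frozen=True)
class Instruction:
    name: str
    operations: Tuple[Operation, ...]


def plain(kind, address):
    """A basic load or store instruction made of a single operation."""
    return Instruction(kind, (Operation(kind, address),))


def load_with_acquire(address):
    """A load with acquire creates two operations: a load, then a barrier."""
    return Instruction(
        "load_with_acquire",
        (Operation("load", address), Operation("barrier")),
    )


def split_at_barrier(program, index):
    """Partition accesses around the barrier of the instruction at program[index]."""
    first, second = [], []
    seen_barrier = False
    for i, instruction in enumerate(program):
        for op in instruction.operations:
            if i == index and op.kind == "barrier":
                seen_barrier = True
            elif op.kind != "barrier":
                (second if seen_barrier else first).append((op.kind, op.address))
    return first, second


program = [plain("store", "x"), load_with_acquire("flag"), plain("load", "y")]
first_group, second_group = split_at_barrier(program, 1)
```

In this sketch the first group holds the prior store together with the load that belongs to the load with acquire, and the second group holds only the subsequent load.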

Turning now to the illustrations, FIG. 1 illustrates a system 100 that filters memory operations using a data memory barrier in a RISC processor, processing environment, or architecture. The RISC processor can include variations of ARM processors, and specifically, in this embodiment, an ARMv8 processor. As illustrated, system 100 can include load/store component 102 that can be communicatively coupled and/or operationally coupled to processor 104 for facilitating operation and/or execution of computer executable instructions and/or components by system 100, memory 106 for storing data and/or computer executable instructions and/or components for execution by system 100 utilizing processor 104, for instance, and storage component 108 for providing longer term storage for data and/or computer executable instructions and/or components that can be executed by system 100 using processor 104, for example. Additionally, and as depicted, system 100 can receive input 110 that can be transformed by execution of one or more computer executable instructions and/or components, by the processor 104, from a first state to a second state, wherein the first state can be distinguished and/or is discernible and/or is different from the second state. System 100 can also produce output 112 that can include an article that has been transformed, through processing by system 100, into a different state or thing.

Turning now to FIG. 2, illustrated is a block diagram of an example, non-limiting embodiment of a system that filters memory operations in accordance with various aspects described herein. System 200 includes a data memory barrier 204 that enforces an ordering constraint on prior instructions 202 and subsequent instructions 206. The data memory barrier 204 is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier operation. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. Data memory barrier 204 ensures that prior instructions 202 are performed and completed before subsequent instructions 206 are executed. Prior instructions 202 and subsequent instructions 206 can each include various combinations of basic load and store instructions plus more complex variants of these instructions (e.g., load-exclusive with acquire, store-exclusive with release, etc.).

In an embodiment, the prior instructions 202 and subsequent instructions 206 can comprise load or store instructions that are configured for loading a first set of data from a memory and storing a second set of data to the memory. The data memory barrier 204 can be configured for ordering the memory operations associated with loading and storing the data, wherein the type of ordering accomplished is based on the position in a program order of the data memory barrier relative to the one or more load instructions and store instructions.
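The ordering constraint of FIG. 2 can be expressed as a simple check, sketched below in Python under stated assumptions. The function name `respects_barriers`, the marker string `"DMB"`, and the access labels are hypothetical and used for illustration only. The sketch accepts a program order containing barrier markers and an observed completion order, and reports whether every access before a barrier completed ahead of every access after it; accesses on the same side of the barrier are left free to reorder.

```python
def respects_barriers(program, observed):
    """Return True if no access completed on the wrong side of a barrier.

    program  -- access labels in program order, with "DMB" marking a barrier
    observed -- the same access labels in the order they completed
    """
    position = {name: i for i, name in enumerate(observed)}

    # Assign each access to a group; the group index advances at each barrier.
    groups = {}
    group = 0
    for name in program:
        if name == "DMB":
            group += 1
        else:
            groups[name] = group

    # Every access in an earlier group must complete before every access
    # in a later group.
    return all(
        position[a] < position[b]
        for a in groups
        for b in groups
        if groups[a] < groups[b]
    )


program = ["ld_a", "st_b", "DMB", "ld_c", "st_d"]
reordered_within_groups = ["st_b", "ld_a", "st_d", "ld_c"]
crossed_the_barrier = ["ld_a", "ld_c", "st_b", "st_d"]
```

Here `reordered_within_groups` is accepted because no access crosses the barrier, while `crossed_the_barrier` is rejected because `ld_c` completes before `st_b`.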

Turning now to FIG. 3, a block diagram illustrating an example, non-limiting embodiment of a system that filters memory operations via a load with acquire instruction in accordance with various aspects described herein is shown. System 300 can include a data memory barrier 304 that orders load operation 302 that precedes the data memory barrier 304 in a program order. Data memory barrier 304 ensures that load operation 302 is performed and completed before subsequent instructions are executed. System 300 shows a simple load with acquire instruction that comprises a load operation and a data memory barrier operation. In other embodiments, other types of load operations can result in different load instructions, such as load exclusive with acquire and other variants.

Turning now to FIG. 4, illustrated is an example, non-limiting embodiment of a system that performs a store with release instruction in accordance with various aspects described herein. System 400 can include data memory barriers 402 and 406 on either side of a store operation 404 in a program order. Data memory barrier 402 ensures that all prior instructions/operations have completed before store operation 404 is initiated, while data memory barrier 406 ensures that store operation 404 is completed before any subsequent memory instructions/operations occur. In addition, the first data memory barrier 402 and the second data memory barrier 406 also create an ordering to ensure that store with release and load with acquire instructions are observed in program order.
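The translations of FIGS. 3 and 4 can be summarized as a small expansion table, sketched here in Python. The mnemonics `LDAR`, `STLR`, `LDR`, `STR`, and `DMB` are used only as convenient labels for load with acquire, store with release, load, store, and data memory barrier; the table name `EXPANSIONS`, the function `expand`, and the register operands are assumptions of this sketch, and any barrier option or domain is omitted.

```python
# Hypothetical expansion table: each complex instruction maps to the
# sequence of simpler operations described for FIGS. 3 and 4.
EXPANSIONS = {
    "LDAR": ("LDR", "DMB"),         # load operation followed by a barrier
    "STLR": ("DMB", "STR", "DMB"),  # barrier, store operation, barrier
}


def expand(program):
    """Translate (mnemonic, operands) pairs into a flat list of operations."""
    micro_ops = []
    for mnemonic, operands in program:
        # Instructions without an entry pass through unchanged.
        for op in EXPANSIONS.get(mnemonic, (mnemonic,)):
            # A barrier takes no register or address operands in this sketch.
            micro_ops.append((op, "" if op == "DMB" else operands))
    return micro_ops


program = [
    ("STR", "w1, [x0]"),
    ("STLR", "w2, [x3]"),
    ("LDAR", "w4, [x3]"),
    ("LDR", "w5, [x0]"),
]
micro_ops = expand(program)
```

Under these assumptions, the load/store path only needs to handle plain loads, plain stores, and barriers, which is the simplification the disclosure describes.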

In view of the example systems described above, methods that may be implemented in accordance with the described subject matter may be better appreciated with reference to the flow charts of FIGS. 5-7. While for purposes of simplicity, the methods are shown and described as a series of blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methods described hereinafter.

Referring now to FIG. 5, illustrated is a flow diagram of an example, non-limiting embodiment of a method for executing a load with acquire instruction. Methodology 500 can start at 502, where a load operation is executed, wherein the load operation specifies an address for accessing data from a memory.

At 504, a data memory barrier can be executed. The data memory barrier is a type of barrier operation which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that certain operations are guaranteed to be performed before the barrier, and others after. The data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed. In this instance, the data memory barrier operation ensures that the prior load operation is performed and completed before subsequent instructions are executed.

Turning now to FIG. 6, illustrated is a flow diagram of an example, non-limiting embodiment of a method for executing a store with release instruction. Methodology 600 can start at 602, where a first data memory barrier operation is executed. The data memory barrier is a type of barrier instruction which causes a CPU or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.

At 604, a store operation is executed. The store operation specifies an address for writing data to memory. At 606, a second data memory barrier operation is executed. Having a store operation between two data memory barrier operations ensures that all other memory operations have been performed and are completed before the store operation is executed, and then no other memory operations are allowed until the store operation is completed. In this way, the store with release instruction performs memory operation ordering using simple store and data memory barrier operations.
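The effect of methodologies 500 and 600 on cooperative message passing can be illustrated with a brute-force enumeration, sketched below in Python. This is a deliberately simplified model and not a description of any actual processor: it assumes a single shared memory, assumes each thread's accesses may complete in any order provided none crosses a barrier, and assumes each barrier-free run touches distinct addresses. All names (`DMB`, `permitted_orders`, `interleavings`, `run`, `outcomes`, `data`, `flag`, `r_flag`, `r_data`) are hypothetical.

```python
from itertools import permutations

DMB = ("DMB", None, None)  # barrier marker; carries no address or value


def permitted_orders(ops):
    """All per-thread completion orders in which no access crosses a barrier."""
    segments = [[]]
    for op in ops:
        if op == DMB:
            segments.append([])
        else:
            segments[-1].append(op)
    orders = [[]]
    for segment in segments:
        orders = [done + list(p) for done in orders for p in permutations(segment)]
    return orders


def interleavings(a, b):
    """All merges of two sequences that preserve each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute a merged trace and return the consumer's (flag, data) reads."""
    memory = {"data": 0, "flag": 0}
    registers = {}
    for kind, address, arg in trace:
        if kind == "ST":
            memory[address] = arg             # arg is the value to store
        else:
            registers[arg] = memory[address]  # arg is the destination register
    return registers["r_flag"], registers["r_data"]


def outcomes(producer, consumer):
    return {
        run(trace)
        for p in permitted_orders(producer)
        for c in permitted_orders(consumer)
        for trace in interleavings(p, c)
    }


# Producer: plain store of data, then store with release of flag
# (barrier, store, barrier). Consumer: load with acquire of flag
# (load, barrier), then plain load of data.
with_barriers = outcomes(
    [("ST", "data", 1), DMB, ("ST", "flag", 1), DMB],
    [("LD", "flag", "r_flag"), DMB, ("LD", "data", "r_data")],
)
# Same accesses with the barriers removed.
without_barriers = outcomes(
    [("ST", "data", 1), ("ST", "flag", 1)],
    [("LD", "flag", "r_flag"), ("LD", "data", "r_data")],
)
```

In this model the outcome `(1, 0)`, in which the consumer sees the flag set but reads stale data, is reachable only when the barriers are removed.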

Turning now to FIG. 7, a flow diagram of an example, non-limiting embodiment of a method for filtering memory operations using a data memory barrier is illustrated. Methodology 700 can start at 702, where a first set of memory operations are executed before a barrier. The barrier ensures that all instructions are completed before step 704, where a second set of memory operations are executed after the data memory barrier.
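Methodology 700 can be pictured with a toy timing model, sketched below in Python. The model is an assumption made purely for illustration: one operation issues per cycle, operations overlap freely, each operation has a made-up latency, and a barrier stalls issue until every previously issued operation has finished. The function name `schedule` and the operation labels are hypothetical.

```python
def schedule(ops):
    """Return {name: (start_cycle, finish_cycle)} for each memory operation.

    ops -- list of ("MEM", name, latency) or ("DMB",) entries in program order
    """
    clock = 0          # cycle at which the next operation may issue
    latest_finish = 0  # latest finish time among operations issued so far
    times = {}
    for op in ops:
        if op[0] == "DMB":
            # The barrier holds back issue until all prior operations complete.
            clock = max(clock, latest_finish)
        else:
            _, name, latency = op
            times[name] = (clock, clock + latency)
            latest_finish = max(latest_finish, clock + latency)
            clock += 1  # next operation may issue one cycle later
    return times


first_set = [("MEM", "ld_a", 5), ("MEM", "st_b", 2)]
second_set = [("MEM", "ld_c", 1)]

with_barrier = schedule(first_set + [("DMB",)] + second_set)
without_barrier = schedule(first_set + second_set)
```

With the barrier, `ld_c` does not start until `ld_a` has finished; without it, `ld_c` can start and finish while `ld_a` is still outstanding.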

Example Computing Environment

As mentioned, advantageously, the techniques described herein can be applied to any reduced instruction set computing environment where it is desirable to perform memory operation ordering or filtering. It is to be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various non-limiting embodiments, i.e., anywhere that memory operation ordering may be performed. Accordingly, the general purpose remote computer described below in FIG. 8 is but one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. Thus, the disclosed subject matter can be implemented on chips or systems in an environment of networked hosted services in which very little or minimal client resources are implicated, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in an appliance.

Although not required, some aspects of the disclosed subject matter can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates in connection with the component(s) of the disclosed subject matter. Software may be described in the general context of computer executable instructions, such as program modules or components, being executed by one or more computer(s), such as projection display devices, viewing devices, or other devices. Those skilled in the art will appreciate that the disclosed subject matter may be practiced with other computer system configurations and protocols.

FIG. 8 thus illustrates an example of a suitable computing system environment 800 in which some aspects of the disclosed subject matter can be implemented, although as made clear above, the computing system environment 800 is only one example of a suitable computing environment for a device and is not intended to suggest any limitation as to the scope of use or functionality of the disclosed subject matter. Neither should the computing environment 800 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 800.

With reference to FIG. 8, an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 810. Components of computer 810 may include, but are not limited to, a processing unit 820, a system memory 830, and a system bus 821 that couples various system components including the system memory to the processing unit 820. The system bus 821 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.

Computer 810 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 810. By way of example, and not limitation, computer readable media can comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 810. Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

The system memory 830 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer 810, such as during start-up, may be stored in memory 830. Memory 830 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 820. By way of example, and not limitation, memory 830 may also include an operating system, application programs, other program modules, and program data.

The computer 810 may also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, computer 810 could include a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and/or an optical disk drive that reads from or writes to a removable, nonvolatile optical disk, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. A hard disk drive is typically connected to the system bus 821 through a non-removable memory interface such as an interface, and a magnetic disk drive or optical disk drive is typically connected to the system bus 821 by a removable memory interface, such as an interface.

A user can enter commands and information into the computer 810 through input devices such as a keyboard and pointing device, commonly referred to as a mouse, trackball, or touch pad. Other input devices can include a microphone, joystick, game pad, satellite dish, scanner, wireless device keypad, voice commands, or the like. These and other input devices are often connected to the processing unit 820 through user input 840 and associated interface(s) that are coupled to the system bus 821, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A graphics subsystem can also be connected to the system bus 821. A projection unit in a projection display device, or a HUD in a viewing device or other type of display device can also be connected to the system bus 821 via an interface, such as output interface 850, which may in turn communicate with video memory. In addition to a monitor, computers can also include other peripheral output devices such as speakers which can be connected through output interface 850.

The computer 810 can operate in a networked or distributed environment using logical connections to one or more other remote computer(s), such as remote computer 870, which can in turn have media capabilities different from device 810. The remote computer 870 can be a personal computer, a server, a router, a network PC, a peer device, personal digital assistant (PDA), cell phone, handheld computing device, a projection display device, a viewing device, or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 810. The logical connections depicted in FIG. 8 include a network 871, such as a local area network (LAN) or a wide area network (WAN), but can also include other networks/buses, either wired or wireless. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 810 can be connected to the LAN 871 through a network interface or adapter. When used in a WAN networking environment, the computer 810 can typically include a communications component, such as a modem, or other means for establishing communications over the WAN, such as the Internet. A communications component, such as a wireless communications component, a modem and so on, which can be internal or external, can be connected to the system bus 821 via the user input interface of input 840, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 810, or portions thereof, can be stored in a remote memory storage device. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers can be used.

Reference throughout this specification to “one embodiment,” “an embodiment,” “a disclosed aspect,” or “an aspect” means that a particular feature, structure, or characteristic described in connection with the embodiment or aspect is included in at least one embodiment or aspect of the present disclosure. Thus, the appearances of the phrase “in one embodiment,” “in one aspect,” or “in an embodiment,” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in various disclosed embodiments.

As utilized herein, NAND and NOR memory refer to two types of flash memory based on the NAND and NOR logic gates that they respectively use. The NAND type is primarily used in main memory, memory cards, USB flash drives, solid-state drives, and similar products, for general storage and transfer of data. The NOR type, which allows true random access and therefore direct code execution, is used as a replacement for the older EPROM and as an alternative to certain kinds of ROM applications. However, NOR flash memory can emulate ROM primarily at the machine code level; many digital designs need ROM (or PLA) structures for other uses, often at significantly higher speeds than (economical) flash memory may achieve. NAND or NOR flash memory is also often used to store configuration data in numerous digital products, a task previously made possible by EEPROMs or battery-powered static RAM.

As utilized herein, terms “component,” “system,” “architecture” and the like are intended to refer to a computer or electronic-related entity, either hardware, a combination of hardware and software, software (e.g., in execution), or firmware. For example, a component can be one or more transistors, a memory cell, an arrangement of transistors or memory cells, a gate array, a programmable gate array, an application specific integrated circuit, a controller, a processor, a process running on the processor, an object, executable, program or application accessing or interfacing with semiconductor memory, a computer, or the like, or a suitable combination thereof. The component can include erasable programming (e.g., process instructions at least in part stored in erasable memory) or hard programming (e.g., process instructions burned into non-erasable memory at manufacture).

By way of illustration, both a process executed from memory and the processor can be a component. As another example, an architecture can include an arrangement of electronic hardware (e.g., parallel or serial transistors), processing instructions and a processor, which implement the processing instructions in a manner suitable to the arrangement of electronic hardware. In addition, an architecture can include a single component (e.g., a transistor, a gate array, . . . ) or an arrangement of components (e.g., a series or parallel arrangement of transistors, a gate array connected with program circuitry, power leads, electrical ground, input signal lines and output signal lines, and so on). A system can include one or more components as well as one or more architectures. One example system can include a switching block architecture comprising crossed input/output lines and pass gate transistors, as well as power source(s), signal generator(s), communication bus(ses), controllers, I/O interface, address registers, and so on. It is to be appreciated that some overlap in definitions is anticipated, and an architecture or a system can be a stand-alone component, or a component of another architecture, system, etc.

In addition to the foregoing, the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using typical manufacturing, programming or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. The terms “apparatus” and “article of manufacture” where used herein are intended to encompass an electronic device, a semiconductor device, a computer, or a computer program accessible from any computer-readable device, carrier, or media. Computer-readable media can include hardware media, or software media. In addition, the media can include non-transitory media, or transport media. In one example, non-transitory media can include computer readable hardware media. Specific examples of computer readable hardware media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Computer-readable transport media can include carrier waves, or the like. Of course, those skilled in the art will recognize many modifications can be made to this configuration without departing from the scope or spirit of the disclosed subject matter.

What has been described above includes examples of the subject innovation. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the subject innovation, but one of ordinary skill in the art can recognize that many further combinations and permutations of the subject innovation are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the disclosure. Furthermore, to the extent that a term “includes”, “including”, “has” or “having” and variants thereof is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.

Moreover, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.

Additionally, some portions of the detailed description have been presented in terms of algorithms or process operations on data bits within electronic memory. These process descriptions or representations are mechanisms employed by those cognizant in the art to effectively convey the substance of their work to others equally skilled. A process is here, generally, conceived to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring physical manipulations of physical quantities. Typically, though not necessarily, these quantities take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared, and/or otherwise manipulated.

It has proven convenient, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise or apparent from the foregoing discussion, it is appreciated that throughout the disclosed subject matter, discussions utilizing terms such as processing, computing, calculating, determining, or displaying, and the like, refer to the action and processes of processing systems, and/or similar consumer or industrial electronic devices or machines, that manipulate or transform data represented as physical (electrical and/or electronic) quantities within the registers or memories of the electronic device(s), into other data similarly represented as physical quantities within the machine and/or computer system memories or registers or other such information storage, transmission and/or display devices.

In regard to the various functions performed by the above described components, architectures, circuits, processes and the like, the terms (including a reference to a “means”) used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., a functional equivalent), even though not structurally equivalent to the disclosed structure, which performs the function in the herein illustrated exemplary aspects of the embodiments. In addition, while a particular feature may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. It will also be recognized that the embodiments include a system as well as a computer-readable medium having computer-executable instructions for performing the acts and/or events of the various processes.

Other than where otherwise indicated, all numbers, values and/or expressions referring to quantities of items such as memory size, etc., used in the specification and claims are to be understood as modified in all instances by the term “about.”

What is claimed is:
 1. A processor that executes computer-executable instructions to perform operations, the instructions comprising: a load with acquire instruction that performs memory operation ordering, wherein the load with acquire instruction comprises a load operation followed by a data memory barrier operation.
 2. The processor of claim 1, wherein the processor is an ARMv8 processor.
 3. The processor of claim 1, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
 4. The processor of claim 1, wherein the load operation specifies an address for accessing a first data from the memory.
 5. The processor of claim 1, wherein the load with acquire instruction comprises at least one of a plurality of types of load with acquire instructions.
 6. The processor of claim 1, wherein the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
 7. A method for executing instructions in a processor, comprising: executing a load with acquire instruction for performing memory operation ordering, wherein the executing the load with acquire instruction comprises executing a load operation followed by a data memory barrier operation.
 8. The method of claim 7, further comprising executing the instructions on an ARMv8 processor.
 9. The method of claim 7, further comprising executing a plurality of types of load with acquire instructions.
 10. The method of claim 7, wherein executing the data memory barrier operation replaces a set of load acquire semantics for memory operation ordering.
 11. The method of claim 7, wherein the load operation specifies an address for accessing a first data from the memory.
 12. The method of claim 7, wherein the data memory barrier operation orders memory operations comprising a first set of memory operations occurring before the barrier operation, and a second set of memory operations occurring after the barrier operation.
 13. A processor that executes computer-executable instructions to perform operations, the instructions comprising: a store with release instruction that performs memory operation ordering, wherein the store with release instruction comprises a first data memory barrier operation followed by a store operation followed by a second data memory barrier operation.
 14. The processor of claim 13, wherein the processor is an ARMv8 processor.
 15. The processor of claim 13, wherein the first and second data memory barrier operations order memory operations comprising a first set of memory operations occurring before the barrier operations, and a second set of memory operations occurring after the barrier operations.
 16. The processor of claim 13, wherein the store operation specifies an address for writing a first data to memory.
 17. The processor of claim 13, wherein the instructions further comprise a plurality of types of store with release instructions.
 18. The processor of claim 13, wherein the second data memory barrier operation ensures that a following load with acquire instruction is observed in a program order.
 19. The processor of claim 13, wherein the data memory barrier operations replace a set of store release semantics for memory operation ordering.
 20. A method for executing instructions in a processor, comprising: executing a store with release instruction for performing memory operation ordering, wherein executing the store with release instruction comprises executing a first data memory barrier operation followed by executing a store operation followed by executing a second data memory barrier operation.
 21. The method of claim 20, further comprising executing the store with release instruction on an ARMv8 processor.
 22. The method of claim 20, further comprising executing a plurality of types of store with release instructions.
 23. The method of claim 20, wherein executing the data memory barrier operations replaces a set of store release semantics for memory operation ordering.
 24. The method of claim 20, wherein executing the first and second data memory barrier operations orders memory operations comprising a first set of memory accesses occurring before the barrier operations, and a second set of memory accesses occurring after the barrier operations.
 25. The method of claim 20, wherein executing the store operation specifies an address for writing a first data to memory.
 26. The method of claim 20, wherein the executing the second data memory barrier operation before executing a load with acquire instruction ensures the instructions are observed in a program order.
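
By way of non-limiting illustration, the micro-operation sequences recited in the claims can be modeled in software. The following Python sketch is a simplified model only and is not the disclosed hardware: the function names `expand` and `expand_program`, the instruction labels `LOAD_ACQUIRE` and `STORE_RELEASE`, and the tuple encoding of micro-operations are hypothetical, and passing all other instructions through unchanged is an assumption of the sketch.

```python
from typing import List, Tuple

# A micro-operation is modeled as a tuple: ("LOAD", reg, addr),
# ("STORE", reg, addr), or ("DMB",). This encoding is hypothetical.
MicroOp = Tuple[str, ...]
Instruction = Tuple[str, str, str]


def expand(instr: str, reg: str, addr: str) -> List[MicroOp]:
    """Return the micro-operation sequence for a single instruction."""
    if instr == "LOAD_ACQUIRE":
        # Claims 1 and 7: a load operation followed by a data memory barrier.
        return [("LOAD", reg, addr), ("DMB",)]
    if instr == "STORE_RELEASE":
        # Claims 13 and 20: first barrier, then the store, then second barrier.
        return [("DMB",), ("STORE", reg, addr), ("DMB",)]
    # Assumption of this sketch: any other instruction is left unchanged.
    return [(instr, reg, addr)]


def expand_program(program: List[Instruction]) -> List[MicroOp]:
    """Expand every instruction of a program, preserving program order."""
    micro_ops: List[MicroOp] = []
    for instr, reg, addr in program:
        micro_ops.extend(expand(instr, reg, addr))
    return micro_ops


if __name__ == "__main__":
    # A store with release followed by a load with acquire on the same address.
    example = [("STORE_RELEASE", "r1", "flag"), ("LOAD_ACQUIRE", "r2", "flag")]
    for op in expand_program(example):
        print(op)
```

In this model, the second barrier of the store with release sits between the store and the following load with acquire, which corresponds to the ordering described in claims 18 and 26.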